Centroid estimation based on symmetric KL di- vergence for Multinomial text classification prob- lem

نویسندگان

  • Jiangning Chen
  • John Dever
  • Rundong Du
چکیده

We define a new centroid estimator for text classification based on the KLdivergence of the classes. The score favors documents that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on several standard data sets indicate that the new method outperforms better than traditional Naive Bayes classifier, especially for larger training data.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Feature Selection Score for Multinomial Naive Bayes Text Classification Based on KL-Divergence

We define a new feature selection score for text classification based on the KL-divergence between the distribution of words in training documents and their classes. The score favors words that have a similar distribution in documents of the same class but different distributions in documents of different classes. Experiments on two standard data sets indicate that the new method outperforms mu...

متن کامل

Probabilistic Hierarchical Clustering Method for Organizing Collections of Text Documents

In this paper a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale sparse high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis and these have been termed as symmetric and asymmetric models. For text data specifically both asymmetric and sy...

متن کامل

A Probabilistic Hierarchical Clustering Method for Organising Collections of Text Documents

In this paper a generic probabilistic framework for the unsupervised hierarchical clustering of large-scale sparse high-dimensional data collections is proposed. The framework is based on a hierarchical probabilistic mixture methodology. Two classes of models emerge from the analysis and these have been termed as symmetric and asymmetric models. For text data specifically both asymmetric and sy...

متن کامل

Large margin multinomial mixture model for text categorization

In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...

متن کامل

An Estimation of Required Rotational Torque to Operate Horizontal Directional Drilling Using Rock Engineering Systems

Horizontal directional drilling (HDD) is widely used in soil and rock engineering. In a variety of conditions, it is necessary to estimate the torque required for performing the reaming operation. Nevertheless, there is not presently a convenient method to accomplish this task. In this paper, to overcome this difficulty based on the basic concepts of rock engineering systems (RES), a model for ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2018